Background

Motivation

Toowoomba Flash Flooding


What was the probability of this flash flood occurring?

El Niño Southern Oscillation (ENSO)





Did ENSO affect the probability of this flash flood occurring?

Thesis Overview

1. Quality Assurance


2. Non-stationarity of Extremes


3. Extremes in Continuous Space with Dependence



4. Extremal Dependence

Extreme Value Theory

Standard Formulation

Let \(X_1, X_2, \dots\) be a sequence of iid random variables and define \[M_{n} = \max\{X_1, \dots, X_{n}\}.\]

The distribution function of \(M_n\) is \[\mathbb{P}(M_n \leq z) = \mathbb{P}(X_1 \leq z, \dots, X_n \leq z) = \mathbb{P}(X \leq z)^n = F(z)^n,\] where \(F(z)\) is the distribution function of \(X\).

Let \(z^F\) denote the right endpoint of the support of \(F\), \[z^F = \sup \{z : F(z) < 1\},\] then as \(n \rightarrow \infty\), \(F(z)^n \rightarrow 0\) for any \(z < z^F\), so the distribution of \(M_n\) degenerates to a point mass at \(z^F\).
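A small numerical sketch of this degeneracy, assuming purely for illustration that \(X\) is a unit exponential variable (so \(z^F = \infty\)); the function name is hypothetical.

```python
import math

def cdf_of_maximum(F_z: float, n: int) -> float:
    """P(M_n <= z) = F(z)^n for n iid variables with P(X <= z) = F_z."""
    return F_z ** n

# Assumed example: X ~ Exponential(1), so F(z) = 1 - exp(-z).
z = 5.0
F_z = 1.0 - math.exp(-z)
for n in (1, 10, 100, 1000, 10000):
    # For fixed z the probability shrinks towards zero as n grows.
    print(n, cdf_of_maximum(F_z, n))
```

For any fixed \(z\) the printed probabilities decrease with \(n\), which is why the maximum needs to be renormalised before a useful limit appears.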

GEV Distribution

If there exist sequences of constants \(\{a_n\} > 0\) and \(\{b_n\} \in \mathbb{R}\) such that \[\mathbb{P} \left\{\dfrac{M_n - b_n}{a_n} \leq z \right \} \rightarrow G(z) \quad\hbox{as}\quad\, n \rightarrow \infty \] where \(G(z)\) is a non-degenerate distribution function, then \(G(z)\) is a member of the generalised extreme value (GEV) family \[ G(z) = \exp \left\{ - \left[ 1 + \xi \left(\dfrac{z-\mu}{\sigma}\right) \right]_+^{-1 / \xi} \right\},\] where \([v]_+ = \max \left\lbrace 0,v \right\rbrace\), \(\mu \in \mathbb{R}\), \(\sigma \in \mathbb{R}^+\) and \(\xi \in \mathbb{R}\).

(Fisher and Tippett 1928, Gnedenko 1943)
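As a minimal sketch, the GEV distribution function above can be evaluated directly. The function name `gev_cdf` and the tolerance used to switch to the Gumbel limit (\(\xi \to 0\)) are assumptions made for this illustration.

```python
import math

def gev_cdf(z: float, mu: float, sigma: float, xi: float) -> float:
    """Evaluate G(z) = exp{-[1 + xi (z - mu) / sigma]_+^(-1/xi)}."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV family as xi -> 0.
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# Illustrative parameter values only.
print(gev_cdf(100.0, mu=60.0, sigma=20.0, xi=0.1))
```

At \(z = \mu\) the bracket equals one, so \(G(\mu) = e^{-1}\) for every shape parameter, which gives a quick sanity check.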

Practicalities

Why approximate \(\mathbb{P}(M_n \leq z)\) by the GEV distribution?

Rainfall observations aren't independent




Rainfall observations aren't identically distributed

GEV Distribution

For the GEV: \(\mu\) is the location parameter, \(\sigma\) is the scale parameter, \(\xi\) is the shape parameter.

Covariates

GEV Parameters (linear functions): \[ \mu = l_{\mu}(\hbox{geographic covariates, climate covariates})\] \[\sigma = l_{\sigma}(\hbox{geographic covariates, climate covariates})\] \[\xi = \hbox{constant}\]

Use the Southern Oscillation Index (SOI) as a measure of ENSO strength.
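A hedged sketch of how such a covariate model changes an exceedance probability. Only the SOI enters here, and every coefficient and the threshold are made-up illustrative values, not fitted estimates. Sustained positive SOI values correspond to La Niña conditions and negative values to El Niño.

```python
import math

def gev_exceedance(z: float, mu: float, sigma: float, xi: float) -> float:
    """Return P(M > z) = 1 - G(z) for a GEV with xi != 0."""
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        return 1.0 if xi > 0 else 0.0
    return 1.0 - math.exp(-t ** (-1.0 / xi))

def exceedance_given_soi(z: float, soi: float, coef: dict) -> float:
    """Linear location and scale in the SOI, constant shape."""
    mu = coef["mu0"] + coef["mu1"] * soi
    sigma = coef["sigma0"] + coef["sigma1"] * soi
    return gev_exceedance(z, mu, sigma, coef["xi"])

# Hypothetical coefficients and threshold (mm), for illustration only.
COEF = {"mu0": 60.0, "mu1": 0.3, "sigma0": 20.0, "sigma1": 0.1, "xi": 0.1}
p_la_nina = exceedance_given_soi(150.0, soi=20.0, coef=COEF)
p_el_nino = exceedance_given_soi(150.0, soi=-20.0, coef=COEF)
print(p_la_nina, p_el_nino, p_la_nina / p_el_nino)
```

With positive SOI slopes, as assumed here, the same threshold is exceeded more often in a La Niña year than in an El Niño year; the size of that ratio depends entirely on the fitted coefficients.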

Modelling

We want to make inferences about the extremes of a rainfall field.

Max-stable Processes

  • Extremes in continuous space with dependence

  • Natural extension from univariate extreme value theory

  • Univariate marginal distributions are GEV distributions

  • Can simulate from these processes

Wettest Day of the Year

Definition

Let \(\{Z_i\}_{i \geq 1}\) be a sequence of independent copies of a stochastic process \(\{ Z(x) : x \in \mathcal{X} \subset \mathbb{R}^2 \}\).

The process \(Z(x)\) is max-stable, if there exist normalising functions, \(\{a_n(x)\} \in \mathbb{R}^+\) and \(\{b_n(x)\} \in \mathbb{R}\), such that \[ Z(x) \stackrel{d}{=} \lim_{n\to\infty} \dfrac{ \max _{i=1,\dots,n} Z_i(x) - b_{n}(x) }{ a_{n}(x) }, \quad x \in \mathcal{X}.\] If the limiting process for the partial maxima process exists and is non-degenerate, then it is a max-stable process.

(De Haan 2006)

Spectral Representation

Any non-degenerate simple max-stable process \(\{ Z(x): x \in \mathcal{X}\}\) defined on a compact set \(\mathcal{X} \subset \mathbb{R}^2\) with continuous sample paths satisfies \[ Z(x) \stackrel{d}{=} \max_{i \geq 1} \zeta_i Y_i(x), \quad\quad x \in \mathcal{X}, \] where \(\{\zeta_i: i \geq 1 \}\) are points of a Poisson process on \((0, \infty)\) with intensity \(\zeta^{-2}\hbox{d}\zeta\), and \(Y_i\) are independent copies of a non-negative stochastic process \(\{Y(x): x\in \mathcal{X}\}\) with continuous sample paths such that \(\mathbb{E}\lbrace Y(x) \rbrace = 1\) for all \(x \in \mathcal{X}\).

(De Haan 1984, Schlather 2002)
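The spectral representation suggests a simple way to simulate. The sketch below is an illustration under several assumptions: it evaluates the process at only two sites, takes \(Y = \exp(W - 1/2)\) with \(W\) standard normal and correlation `rho` between sites (so that \(\mathbb{E}\{Y\} = 1\)), and truncates the maximum to the first `n_points` Poisson points. The points \(\zeta_i\) are generated as reciprocals of the arrival times of a unit-rate Poisson process.

```python
import math
import random

def simulate_simple_max_stable_pair(rho, n_points=200, rng=random):
    """One draw of (Z(x1), Z(x2)) from a truncated spectral representation."""
    z1 = z2 = 0.0
    arrival = 0.0
    for _ in range(n_points):
        arrival += rng.expovariate(1.0)   # arrival times of a unit-rate Poisson process
        zeta = 1.0 / arrival              # points with intensity zeta^-2 d(zeta)
        w1 = rng.gauss(0.0, 1.0)
        w2 = rho * w1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y1 = math.exp(w1 - 0.5)           # E[exp(W - 1/2)] = 1 for W ~ N(0, 1)
        y2 = math.exp(w2 - 0.5)
        z1 = max(z1, zeta * y1)
        z2 = max(z2, zeta * y2)
    return z1, z2

rng = random.Random(2017)
draws = [simulate_simple_max_stable_pair(0.5, rng=rng) for _ in range(1000)]
# Each margin should be close to unit Frechet, so P(Z <= 1) is near exp(-1).
print(sum(1 for z1, _ in draws if z1 <= 1.0) / len(draws))
```

Because the \(\zeta_i\) arrive in decreasing order, later points rarely change the maximum, which is why truncation is a reasonable approximation here.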

Conditional Simulation

Original Questions

Was the Toowoomba flash flood more likely due to the 2010-2011 La Niña?

Yes: approximately 85% more likely than in an El Niño year.

If we had a La Niña of similar strength, what is the probability of this flash flood occurring?

1 in 7.5 (0.134)

(For details see Saunders et al. 2017.)

Question




Can we do this type of modelling for all of Australia?


Dependence

Challenge


Regionalisation

Clustering

Clustering Distance

Form clusters based on extremal dependence!
(Bernard et al. 2013)

  • Use only the raw annual maxima
  • No information about climate or topography

Use the F-madogram distance (Cooley et al. 2006) \[d(x_i, x_j) = \tfrac{1}{2} \mathbb{E} \left[ \left| F_i(M_{x_i}) - F_j(M_{x_j}) \right| \right]\] where \(M_{x_i}\) is the annual maximum rainfall at location \(x_i \in \mathbb{R}^2\) and \(F_i\) is the distribution function of \(M_{x_i}\).

This distance can be estimated non-parametrically.
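One common non-parametric approach, sketched here as an assumption rather than as the exact estimator used in this work, replaces each \(F_i\) with the empirical distribution function (ranks divided by \(n + 1\)) and averages over the years both stations have in common.

```python
def empirical_cdf_values(values):
    """Return rank / (n + 1) for each observation (ties broken by order)."""
    n = len(values)
    order = sorted(range(n), key=lambda t: values[t])
    u = [0.0] * n
    for rank, t in enumerate(order, start=1):
        u[t] = rank / (n + 1)
    return u

def fmadogram_distance(maxima_i, maxima_j):
    """Estimate d = 0.5 * E|F_i(M_i) - F_j(M_j)| from paired annual maxima."""
    if len(maxima_i) != len(maxima_j):
        raise ValueError("series must cover the same years")
    u = empirical_cdf_values(maxima_i)
    v = empirical_cdf_values(maxima_j)
    return 0.5 * sum(abs(a - b) for a, b in zip(u, v)) / len(u)

# Made-up annual maxima (mm) for two stations over the same five years.
print(fmadogram_distance([80, 120, 95, 60, 150], [70, 130, 90, 65, 110]))
```

Because only ranks are used, the estimate does not change under any increasing transformation of either series, so no marginal model has to be fitted first.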

Extremal Coefficient

For \(M_{x_i}\) and \(M_{x_j}\) with common GEV marginals, the extremal coefficient \(\theta(x_i - x_j)\) is defined by \[\mathbb{P}\left( M_{x_i} \leq z, M_{x_j} \leq z \right) = \left[\mathbb{P}(M_{x_i}\leq z)\,\mathbb{P}(M_{x_j}\leq z) \right]^{\tfrac{1}{2}\theta(x_i - x_j)}.\]

The range of \(\theta(x_i - x_j)\) is \([1 , 2]\), where \(1\) corresponds to complete dependence and \(2\) to independence.

We can write our distance measure as a function of the extremal coefficient, \(\theta(x_i - x_j)\), \[d(x_i, x_j) = \dfrac{\theta(x_i - x_j) - 1}{2(\theta(x_i - x_j) + 1)}.\]

Therefore the range of \(d(x_i, x_j)\) is \([0 , 1/6]\).
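The relation between the distance and the extremal coefficient can be inverted by rearranging for \(\theta\), giving \(\theta = (1 + 2d)/(1 - 2d)\). A short sketch with hypothetical function names:

```python
def distance_from_theta(theta: float) -> float:
    """F-madogram distance implied by an extremal coefficient."""
    return (theta - 1.0) / (2.0 * (theta + 1.0))

def theta_from_distance(d: float) -> float:
    """Invert d = (theta - 1) / (2 (theta + 1)) for theta; requires d < 1/2."""
    return (1.0 + 2.0 * d) / (1.0 - 2.0 * d)

for theta in (1.0, 1.5, 2.0):
    d = distance_from_theta(theta)
    print(theta, d, theta_from_distance(d))
```

An estimated distance can fall slightly outside \([0, 1/6]\) through sampling variability, in which case the implied \(\theta\) falls outside \([1, 2]\).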

K-Medoids Clustering

Partitioning around Medoids (PAM): (Kaufman and Rousseeuw 1990)

  1. Randomly select an initial set of \(K\) stations. These are the set of the initial medoids.
  2. Assign each station, \(x_i\), to its closest medoid, \(m_k\), based on the F-madogram distance.
  3. For each cluster, \(C_k\), update the medoid according to \[m_k = \mathop{\mathrm{argmin}}\limits_{x_i \in C_k} \sum_{x_j \in C_k} d(x_i, x_j).\]
  4. Repeat steps 2 and 3 until the medoids are no longer updated.
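The steps above can be sketched on a precomputed distance matrix. This follows the alternating assign-and-update scheme as listed, not the full swap-based search of the original PAM algorithm; the function names, the `seed` and `max_iter` arguments, and the handling of empty clusters are assumptions of this sketch.

```python
import random

def assign_to_medoids(dist, medoids):
    """Step 2: label each station with the index of its closest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def k_medoids(dist, k, seed=0, max_iter=100):
    """dist is a symmetric n x n matrix of F-madogram distances."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(dist)), k)        # step 1
    for _ in range(max_iter):
        labels = assign_to_medoids(dist, medoids)    # step 2
        new_medoids = []
        for c in range(k):                           # step 3
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Assumption: keep the old medoid if a cluster empties out.
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:                   # step 4
            break
        medoids = new_medoids
    return medoids, assign_to_medoids(dist, medoids)

# Toy distances between four stations forming two obvious groups.
points = [0, 1, 10, 11]
toy_dist = [[abs(a - b) for b in points] for a in points]
print(k_medoids(toy_dist, k=2))
```

The result depends on the random initial medoids, so in practice several seeds are usually tried.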

Clustering

Spurious Clustering

Let \(d_e(x_i,x_j)\) be the Euclidean distance between \(x_i\) and \(x_j\).
Consider \(\max\{d_e(x_i,x_j), 2\}\) as the clustering distance.

Density Sensitive

Australian Rainfall Network

Hierarchical Clustering

  1. Each station starts in its own cluster
  2. For each pair of clusters, \(C_k\) and \(C_{k'}\), define the distance between the clusters as \[d(C_k, C_{k'}) = \frac{1}{|C_k| |C_{k'}|} \sum_{x_k \in C_k} \sum_{x_{k'} \in C_{k'}} d(x_k, x_{k'}).\]
  3. Merge the two clusters with the smallest distance
  4. Update the distances relative to the new cluster
  5. Repeat steps 3 and 4 until all points are combined in a single cluster
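A minimal sketch of these steps with the average-linkage distance defined above. For brevity it recomputes every between-cluster distance at each round instead of updating them incrementally, so it is only suitable for small examples; the function names are hypothetical.

```python
def average_linkage(dist, cluster_a, cluster_b):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist[i][j] for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(dist):
    """Return the merge history as (members_a, members_b, height) tuples."""
    clusters = [[i] for i in range(len(dist))]       # step 1
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):               # step 2
            for b in range(a + 1, len(clusters)):
                h = average_linkage(dist, clusters[a], clusters[b])
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append((clusters[a], clusters[b], h)) # step 3
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (a, b)] + [merged] # step 4
    return merges

points = [0, 1, 10, 11]
toy_dist = [[abs(a - b) for b in points] for a in points]
for members_a, members_b, height in agglomerate(toy_dist):
    print(members_a, members_b, height)
```

The recorded merge heights are what a dendrogram displays, and cutting at a chosen height recovers a partition of the stations.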

Dendrogram

Revisit our examples

Examples

Classify

  • Classify a station relative to its closest neighbours
  • Use a weighted \(k\)-nearest-neighbour classification (\(w\)-kNN)
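A sketch of one possible weighted kNN rule. The inverse-distance weighting, the choice of \(k\), and the notion of distance between stations are all assumptions here, since the slides do not specify them.

```python
from collections import defaultdict

def weighted_knn_label(distances, labels, k=3, eps=1e-9):
    """Classify a station from its k nearest neighbours.

    distances[i] is the distance from the target station to neighbour i,
    labels[i] is that neighbour's cluster label.
    """
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    votes = defaultdict(float)
    for i in nearest:
        # Assumed weighting: closer neighbours get larger votes.
        votes[labels[i]] += 1.0 / (distances[i] + eps)
    return max(votes, key=votes.get)

# One very close neighbour in cluster "A" outweighs two farther ones in "B".
print(weighted_knn_label([1.0, 2.0, 2.5, 10.0], ["A", "B", "B", "B"], k=3))
```

In this toy case an unweighted majority vote would pick "B", whereas the weighted vote favours the much closer "A" neighbour.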

Regionalisation

Choosing a cut height

Similar Dependence


Where can we assume a common dependence structure?

Australia

Conclusions

  • Max-stable models are a powerful tool for modelling extremes

  • Dependence of annual maxima in Australia is highly variable and highly localised

  • Exercise caution in our modelling assumptions

e. katerobinsonsaunders@gmail.com

t. @katerobsau

g. github.com/katerobsau